Genome evolution predicts genetic interactions in
protein complexes and reveals cancer drug targets

Xiaowen Lu, Philip R. Kensche, Martijn A. Huynen & Richard A. Notebaart



Genetic interactions reveal insights into cellular function and can be used to identify drug 
targets. Here we construct a new model to predict negative genetic interactions in protein 
complexes by exploiting the evolutionary history of genes in parallel converging pathways in 
metabolism. We evaluate our model with protein complexes of Saccharomyces cerevisiae and 
show that the predicted protein pairs more frequently have a negative genetic interaction 
than random proteins from the same complex. Furthermore, we apply our model to human 
protein complexes to predict novel cancer drug targets, and identify 20 candidate targets with 
empirical support and 10 novel targets amenable to further experimental validation. Our study 
illustrates that negative genetic interactions can be predicted by systematically exploring 
genome evolution, and that this is useful to identify novel anti-cancer drug targets. 
Knowledge of how proteins interact with each other to exert
their function is crucial for understanding how disruption
of interactions can lead to disease and for the
development of treatments. In recent years, several system-level
maps of protein complexes have been constructed from physical
interaction data as a first step towards understanding the functional
relationships between proteins. These studies are important in
identifying which proteins are linked to each other in biological
processes. Yet, these maps do not directly reveal how the
proteins interact with each other. More specifically, they do not
provide information about whether the interaction between two
proteins is symmetric, in which both proteins are equally
important in the function of a protein complex, or whether the
interaction is asymmetric, in which one protein can function in
the absence of the other protein, but not vice versa (Fig. 1a). One
example of such asymmetry is the cyclin-Cdc28 complex, where
the function of the cyclin, Cln1p, depends on the Cdc28 kinase,
but not vice versa. The function of Cln1p depends on Cdc28p as
transcriptional activation of CLN1 requires an active Cdc28
kinase. The function of Cdc28p, however, does not depend on
Cln1p as the presence of Cln2p compensates for Cln1p's absence
to activate Cdc28p (ref. 8). Thus, there is a functional asymmetry
between Cln1p and Cdc28p, where Cln1p depends on Cdc28p
and not vice versa (Cln1p → Cdc28p). Similarly, there is
asymmetry between Cln2p and Cdc28p (Cln2p → Cdc28p). This
example shows the relationship between functional asymmetry
and what is called a negative genetic interaction, where mutations
(for example, knockouts) of two genes (for example, CLN1 and
CLN2) reduce fitness much more strongly than would be
expected from the decline in fitness caused by each mutation individually.
The concept of negative genetic interactions is very valuable in the
development of therapeutic treatments for diseases that can be
treated by selectively depleting cells with a disease-causing
mutation. Especially promising are the discoveries of cancer
drugs that target proteins having synthetic lethal interactions with
mutated oncogenes or tumour-suppressor genes. The
mechanism underlying this treatment is that inhibiting these
genes separately is relatively harmless in a normal cell, while it is
lethal to a cancer cell as it causes a lethal double mutant with
mutated oncogenes/tumour-suppressor genes. Thus, targeting

Figure 1 | Functional asymmetry and negative genetic interactions are linked. (a) Protein A and B have an asymmetric functional relationship, where
the function of A depends on B but not vice versa. The asymmetry between protein A and B can be owing to the presence of protein C, which can
compensate for a mutant of A. In such a scenario, proteins A and C are predicted to have a negative genetic interaction. (b) Functional asymmetry between
enzyme A and B involved in a branched pathway in a metabolic network (A depends on B, but not vice versa: A → B). Nodes and arrows represent
metabolites and reactions, respectively. The asymmetric relationship is owing to a converging reaction catalysed by enzyme C, which can compensate for
A's absence. Thus, enzyme A and C are likely to have a negative genetic interaction. (c) Flowchart to predict negative genetic interactions from genome
evolution within a three-member protein complex (protein A, B and C). The blue arrow represents the functional asymmetry between two genes inferred
from genome evolution. The blue line represents that there is no evolutionary evidence for a functional asymmetry between two genes. Here, both gene
A and C are predicted to have functional asymmetry with B, while gene A and C are predicted not to have functional asymmetry.

these genes can kill the cancer cells while leaving the normal cells
relatively unaffected. Prioritizing drug targets in such an approach,
by predicting negative genetic interactions between the mutated
oncogenes/tumour-suppressor genes and other proteins, is,
however, not trivial.
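
The definition of a negative genetic interaction given above can be illustrated numerically. The sketch below assumes a multiplicative null model, in which the expected fitness of a double mutant is the product of the two single-mutant fitness values; that null model, the function names and the threshold value are assumptions made for illustration and are not taken from this study.

```python
def interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation.

    f_a, f_b: fitness of each single mutant relative to wild type (1.0).
    f_ab: fitness of the double mutant relative to wild type.
    """
    return f_ab - f_a * f_b


def is_negative_interaction(f_a: float, f_b: float, f_ab: float,
                            threshold: float = -0.1) -> bool:
    """True when the double mutant is clearly less fit than expected.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    return interaction_score(f_a, f_b, f_ab) < threshold


if __name__ == "__main__":
    # Hypothetical fitness values: expected 0.9 * 0.8 = 0.72, observed 0.40.
    print(interaction_score(0.9, 0.8, 0.40))
    print(is_negative_interaction(0.9, 0.8, 0.40))
```

A score near zero means the double mutant behaves as expected from the single mutants; a strongly negative score corresponds to the synthetic sick or synthetic lethal case discussed above.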

Discovering negative genetic interactions mainly depends on 
laborious and specific experiments, which can be expensive and 
time-consuming, partially because of the explosion of the number 
of pairwise gene combinations. Several computational approaches 
have been developed to predict genetic interactions by integrating 
multiple types of functional genomic data, such as synthetic 
lethality data, physical interaction data and co-expression
data. These approaches, however, strongly depend on
species-specific empirical genetic interaction data as input and 
therefore do not allow predictions for other species where genetic 
interaction data are largely unavailable. 

Here, we aim to predict negative genetic interactions in protein 
complexes via the concept of functional asymmetry, which we 
infer from genome evolution. The example of the cyclin-Cdc28 
complex showed that functional asymmetry and negative genetic 
interaction are linked. This linkage can also be illustrated by 
enzyme relationships in metabolism. Enzymes in converging 
pathways have asymmetric relationships with an enzyme in an 
outgoing pathway (Fig. 1b). Consequently, the enzymes in
converging pathways can have negative genetic interactions, as
they can compensate for each other's absence. Analogous to
metabolism, we expect that two proteins in a complex with
asymmetry to a third protein will have a negative genetic
interaction (Fig. 1a,c). Importantly, the functional asymmetry
between enzymes in metabolism is indeed reflected in genome
evolution.

On the basis of these prior studies, we have developed an 
evolutionary model to predict asymmetric functional relationships
and negative genetic interactions in protein complexes in
S. cerevisiae. Our model predicts that almost 75% of the protein
complexes in S. cerevisiae contain functionally asymmetric protein 
pairs. By integrating the information of predicted asymmetry in 
protein complexes, we show an up to twofold increase in the 
predictive power for negative genetic interactions relative to 
randomly chosen protein pairs from a complex. Moreover, our 
results show a twofold increase in prediction precision compared 
with an alternative model. After mapping negative genetic
interaction predictions from yeast to human, as well as a direct 
application to human protein complexes, we predict 20 cancer 
drug targets with empirical support and 10 completely novel 
targets not yet experimentally examined. Our study shows that 
higher-order functional relationships can be predicted by 
systematically exploring genome evolution, thereby providing a 
framework to interpret protein complex function with broad 
application to medical genetics. 

Results 

Functional asymmetry occurs frequently in protein complexes. 

In order to examine if patterns in genome evolution can be used
to predict negative genetic interactions, we first predicted asymmetry
between protein pairs (A-B) in protein complexes from
evolutionary analysis. We constructed a model integrating 11
evolutionary variables from the reconstructed ancestral states on
a phylogenetic tree of 373 species (Fig. 2 and Methods). For
instance, evolutionary asymmetry between proteins A and B is
inferred from the occurrence of multiple evolutionary loss events
where only one of the two genes was lost in the descendant while
both genes were present in the ancestor. If A is more frequently
lost than B, then A is expected to be functionally dependent on B
(A → B, see Fig. 2a,b, scenario f1). The model was trained on a set
of functionally asymmetric enzyme pairs in the genome-scale
metabolic network of S. cerevisiae to predict the dependency
between the two enzymes (see Methods for detail).
Using a tenfold cross-validation, the model showed a correct
classification rate of 64.4% and an area under the receiver
operating characteristic curve of 0.7 (see Methods for details;
Supplementary Fig. S1). It should be noted that the performance
of the model becomes worse when using a simpler model with
only single gain and loss events (Supplementary Fig. S1). We next
asked how frequently functional asymmetry occurs in empirically
determined protein complexes from S. cerevisiae. Our analysis
predicts that 71% (6,145 out of 8,711) of the protein pairs in these
protein complexes are functionally asymmetric. Furthermore,
~75% (307 out of 409) of the protein complexes are found to
have at least one predicted functionally asymmetric protein pair
(Supplementary Fig. S2).
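
As a minimal sketch of the single-loss scenario described above, the following code counts the branches of a tree on which both genes are present in the ancestor and exactly one of them is lost in the descendant. The input format (one pair of presence states per branch) and the function names are hypothetical, and the snippet covers only this one scenario, not the full 11-variable classifier.

```python
from typing import Iterable, Tuple

# (gene A present, gene B present) at one node of the tree
Presence = Tuple[bool, bool]


def count_single_losses(
    branches: Iterable[Tuple[Presence, Presence]]
) -> Tuple[int, int]:
    """Count branches where only A, or only B, is lost.

    Each branch is (ancestor_state, descendant_state). Only branches whose
    ancestor carries both genes are considered.
    """
    lost_a_only = 0
    lost_b_only = 0
    for ancestor, descendant in branches:
        if ancestor != (True, True):
            continue
        if descendant == (False, True):
            lost_a_only += 1
        elif descendant == (True, False):
            lost_b_only += 1
    return lost_a_only, lost_b_only


def suggested_direction(lost_a_only: int, lost_b_only: int) -> str:
    """The gene lost more often on its own is the candidate dependent one."""
    if lost_a_only > lost_b_only:
        return "A->B"  # A depends on B
    if lost_b_only > lost_a_only:
        return "B->A"  # B depends on A
    return "none"


if __name__ == "__main__":
    both = (True, True)
    toy_branches = [
        (both, (False, True)),             # A lost, B kept
        (both, (False, True)),             # A lost, B kept
        (both, (True, False)),             # B lost, A kept
        (both, both),                      # nothing lost
        ((True, False), (False, False)),   # ignored: B absent in ancestor
    ]
    print(suggested_direction(*count_single_losses(toy_branches)))
```

In the model described in the text, counts of this kind are only one input among several; the final call is made by the trained classifier rather than by a simple majority rule.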

Empirical evidence for functional asymmetry. The high
frequency of predicted functional asymmetry raises the question
to what extent our predictions are biologically meaningful.
To answer this question, we asked whether predicted asymmetry
is reflected in genome-scale empirical data. We first examined
asymmetry in gene essentiality. For a predicted functionally
asymmetric pair (A → B) where only one of the proteins is
essential, we expect protein B to be the essential one. To test this,
we examined those predicted asymmetric pairs where only one of
the two proteins is essential and quantified to what extent
the predicted asymmetry is consistent with asymmetry in gene
essentiality. Our analysis reveals that 72% (1,071 out of 1,497) of
the asymmetric pairs (A → B) are consistent with asymmetry in
gene essentiality, that is, if one of the two genes is essential, it is B
(Fig. 3a; one-tailed Fisher's exact test; P < 2.2e-16).
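
The one-tailed Fisher's exact tests reported in this section operate on a 2×2 contingency table. The sketch below computes the upper-tail P-value directly from the hypergeometric distribution using only the standard library. The function name and the example counts are hypothetical; the contingency tables behind the reported P-values are not given in the text.

```python
from math import comb


def fisher_one_tailed(a: int, b: int, c: int, d: int) -> float:
    """Upper-tail Fisher's exact test for the table [[a, b], [c, d]].

    Returns the probability of seeing a count of at least `a` in the
    top-left cell when the row and column totals are held fixed.
    """
    row1 = a + b          # e.g. predicted asymmetric pairs
    col1 = a + c          # e.g. pairs consistent with the empirical data
    total = a + b + c + d
    denominator = comb(total, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical toy table, chosen only to show the call.
    print(fisher_one_tailed(3, 1, 1, 3))
```

A small P-value indicates that the top-left cell is larger than expected by chance, which is the direction tested when asking whether predicted pairs are enriched for an empirical property.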

Many predicted asymmetric pairs have, however, no asymmetry
in gene essentiality, simply because the majority of proteins
are not strictly essential under standard laboratory conditions
(that is, glucose-rich medium). Approximately 40% of the
predicted asymmetric pairs are composed of two non-essential
genes. Nevertheless, even when proteins are non-essential they
can still contribute to fitness and result in significant growth
defects after a gene knockout. Similarly to the gene
essentiality analysis, we expected that if one of the two proteins
in a predicted asymmetric pair (A → B; both non-essential) has a
stronger growth defect, it would be protein B. To test this, we
examined 511 predicted asymmetric pairs where two non-essential
proteins cause different growth defects, and quantified
the consistency between the predicted and empirical functional
asymmetry. As expected, 63% (322 out of 511) of the cases are
consistent (Fig. 3a; one-tailed Fisher's exact test; P = 1.40e-04),
that is, the predicted independent protein B has a stronger growth
defect when knocked out. This consistency is robust at various
cutoffs on empirical growth defect differences (ranging from
0.0001 to 0.3). Notably, the level of consistency increased from 63
to 67% when we increased the cutoff to 0.3, that is, where growth
defect difference is largest. These results show that the
evolutionary model is able to capture asymmetric functional
relationships in protein complexes. ESCRT-I, a protein complex
functioning in cargo selection in the multivesicular body (MVB)
sorting pathway, is one example where the predicted functional
asymmetry is supported by solid empirical evidence (Fig. 3b).

Empirical evidence for negative genetic interactions. An
asymmetric functional relationship observed between two proteins
A and B can also be linked to a type of relationship known
as a negative genetic interaction. The reasoning behind this is
that B may not depend on the presence of A, because of the

Figure 2 | Evolutionary variables to predict functional asymmetry. (a) Depiction of an evolutionary event. It shows a loss of gene A in the descendant (d)
when both A and B are present in the ancestor (a). (b) Evolutionary variables used for the Bayesian classifier. The evolutionary variables across history of
the dependent protein A in an asymmetric pair (A → B) can be expressed via 11 measures, listed in the first columns of two boxes. Of these 11, 6 count the
number of times that a dependent protein A is gained or lost independently of B, including 2 that are expected to occur less often and 4 that are expected
to occur more often for a dependent protein than for an independent protein. The remaining five measures are fractions that express relative frequencies of
two evolutionary scenarios between A and B. f1, f2, f3 and f4 are expected to be larger than 0.5; f5, which weighs one of these fractions with the absolute difference between two
gain events of A, is expected to be larger than 0. The six event counts and the five relative frequencies all contribute to the prediction of functional
asymmetry (Supplementary Fig. S1). (c) Schematic presentation of predicting the functional asymmetry in a protein pair (A-B). A TAN (tree-augmented naive Bayes) classifier was trained
on functionally asymmetric enzyme pairs in the metabolic network. For a protein pair (A-B), the 11 evolutionary variables (Fig. 2b) are used to predict the
conditional probability that A is functionally independent (α1) or dependent (α2). A is likely to be independent if α1 > α2, or dependent if α1 < α2.
Subsequently, A can be predicted as independent, dependent or unclassified by applying a cutoff on probability (α = max(α1, α2), see Methods). The same
procedure is applied to protein B. By combining the prediction of both, A-B can be either A → B where A is dependent and B is independent, B → A where A
is independent and B is dependent, or no evidence for asymmetry when A and B have any other combination of predicted relationship.
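
The decision rule in panel (c) can be written out as a short sketch. Here `a1` and `a2` stand for the two conditional probabilities from the caption (independent and dependent, respectively). The cutoff value is hypothetical, since the actual value is given only in the Methods, and treating a maximum probability below the cutoff as "unclassified" is an assumed reading of the caption.

```python
def classify_protein(a1: float, a2: float, cutoff: float) -> str:
    """Label one protein of a pair from its two class probabilities.

    a1: probability that the protein is functionally independent.
    a2: probability that the protein is functionally dependent.
    cutoff: minimum value of max(a1, a2) needed to make a call
            (hypothetical; the paper's value is in its Methods).
    """
    if max(a1, a2) < cutoff or a1 == a2:
        return "unclassified"
    return "independent" if a1 > a2 else "dependent"


def classify_pair(label_a: str, label_b: str) -> str:
    """Combine the two per-protein labels into a pair-level prediction."""
    if label_a == "dependent" and label_b == "independent":
        return "A->B"
    if label_a == "independent" and label_b == "dependent":
        return "B->A"
    return "no evidence for asymmetry"


if __name__ == "__main__":
    cutoff = 0.6  # hypothetical value for illustration only
    label_a = classify_protein(0.2, 0.8, cutoff)   # dependent
    label_b = classify_protein(0.9, 0.1, cutoff)   # independent
    print(classify_pair(label_a, label_b))
```

Any combination other than one dependent and one independent protein, including pairs with an unclassified member, falls into the "no evidence" category, mirroring the last sentence of the caption.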



presence of another protein C that compensates for A's absence.
We asked whether our evolutionary model is capable of
predicting such negative genetic interactions in multi-member
protein complexes. We specifically focused on triplets where
protein A, B and C are predicted to have an evolutionary fan-in
motif (Fig. 1a). This motif is characterized by (i) A and C depend
on B, but not vice versa and (ii) A does not depend on C and vice
versa. This motif is analogous to converging pathways in
metabolism, in which there is a mechanistic explanation for
the compensatory effect between A and C (Fig. 1b). We therefore
first investigated to what extent converging metabolic
pathways show negative genetic interactions as a proxy
for compensation, that is, double mutants cause more severe
growth defects than expected from the two single mutants.
Indeed, enzyme pairs in converging pathways show a threefold
enrichment in negative genetic interactions compared with
non-converging enzyme pairs (Fig. 3c; one-tailed Fisher's exact test;
P = 2.42e-11).
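
The fan-in motif defined by conditions (i) and (ii) can be enumerated from a set of predicted asymmetric pairs. The following is a minimal sketch under assumed data structures: `depends_on` is a hypothetical set of ordered pairs `(X, Y)` meaning "X is predicted to depend on Y", and `members` lists the proteins of one complex.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def fan_in_motifs(
    members: Iterable[str],
    depends_on: Set[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (A, B, C) triplets in which A and C both depend on B.

    Condition (i): A->B and C->B, with no dependency of B on A or C.
    Condition (ii): no predicted dependency between A and C.
    """
    members = list(members)
    motifs = []
    for b in members:
        dependents = sorted(
            x for x in members
            if (x, b) in depends_on and (b, x) not in depends_on
        )
        for a, c in combinations(dependents, 2):
            if (a, c) not in depends_on and (c, a) not in depends_on:
                motifs.append((a, b, c))
    return motifs


if __name__ == "__main__":
    # Hypothetical four-member complex.
    complex_members = ["A", "B", "C", "D"]
    predicted = {("A", "B"), ("C", "B"), ("D", "B"), ("D", "A")}
    for a, b, c in fan_in_motifs(complex_members, predicted):
        print(f"{a} and {c} both depend on {b}: candidate negative interaction {a}-{c}")
```

In this toy complex the pair A-D is excluded because D is predicted to depend on A, which violates condition (ii).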

Given this result, we expect that in cases where we predict a 
fan-in motif within a protein complex, the A and C proteins have 
negative genetic interactions with each other. To address this, we 
compared fan-in A-C pairs with randomly chosen pairs from the
same protein complex (that is, non-motif A-C pairs). Our results 
reveal that the fraction of negative genetic interactions increases 

Figure 3 | Asymmetric functional relationships and negative genetic interactions. (a) Empirical support for predicted functional asymmetry in gene
essentiality and, for non-essential genes, in the growth defect of single-gene knockout. For predicted asymmetric pairs (A → B), the fraction
$f_{01} = n_{01}/(n_{01} + n_{10})$, where 0 = non-essential or no fitness defect and 1 = essential or substantial growth defect, is expected to be larger than 0.5 (fraction
in pairs with no asymmetry). (b) One example of predicted functional asymmetry with experimental support. ESCRT-I is composed of four members,
Vps23p, Vps28p, Vps37p and Mvb12p. Mvb12p (blue subunit) is predicted to be functionally dependent on Vps23p, Vps28p and Vps37p (red) and not vice
versa. Structural studies revealed that the functionally dependent subunit, Mvb12p, is a structural stabilizer, which changes the ESCRT-I core complex
(Vps23p, Vps28p and Vps37p) from a fan-shaped structure (upper panel) to an elongated structure (lower panel). First, the dependency of Mvb12p on the
core complex is supported by the fact that Mvb12p is unstable in cells lacking any of the other ESCRT-I subunits. Second, that Vps23p, Vps28p and
Vps37p do not strictly depend on Mvb12p is validated by the findings that certain MVB sorting pathways, such as carboxypeptidase S (CPS) and Ste2
sorting, are effective even if Mvb12p is absent. However, loss of function of Vps23p, Vps28p or Vps37p results in a complete block of the MVB
pathway. (c) Enrichment of negative genetic interactions in fan-in motif A-C pairs. The fraction of negative genetic interactions is defined as
$f = n_{neg}/(n_{neg} + n_{pos} + n_{nointeraction})$. P-values in (a) and (c) were calculated with a one-tailed Fisher's exact test and visualized with an asterisk (*)
that stands for a P-value < 0.05.



by 50% when applying the evolutionary motif (from 23.5 to 35.8%;
one-tailed Fisher's exact test; P = 0.00085; Fig. 3c). It should be
noted that this enrichment of negative genetic interactions is not
owing to functional complementation by homologous genes
resulting from intra-complex gene duplications, as the results
are hardly affected by removing homologous gene pairs (from
22.7 to 33.0%; one-tailed Fisher's exact test; P = 0.012; Fig. 3c). As
protein complexes are not necessarily active in standard
laboratory conditions where genetic interactions have been
measured, we expected that our test underestimates the predictive
power. Indeed, when selecting for only protein complexes that are
active in the cell (that is, B is essential) the predictive power even
doubles (from 24.5 to 49.2%; one-tailed Fisher's exact test;
P = 4.39e-05; Fig. 3c). Thus, the evolutionary model can predict protein
pairs that have a negative genetic interaction significantly and
substantially more often than random pairs from protein
complexes. To further assess the performance of our model, we
compared its prediction precision and sensitivity with those of a
model by Pandey et al. by mapping their results to the protein
complexes. This model has the highest known prediction
accuracy and is also independent of genetic interaction
information as input. Interestingly, our model has a twofold
higher prediction precision (precision = TruePositives/
(TruePositives + FalsePositives): 0.36 versus 0.18; sensitivity =
TruePositives/(TruePositives + FalseNegatives): 0.47 versus 0.78).
Notably, our model still has a precision of >0.3 when
parameterized to the same sensitivity as Pandey's model
(Supplementary Fig. S3).
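
The two performance measures used in this comparison follow the standard definitions and can be computed as in the sketch below. The counts in the example are hypothetical and were chosen only so that the ratios land near the values quoted above; the underlying counts are not reported in the text.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted negative interactions that are confirmed."""
    return true_positives / (true_positives + false_positives)


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known negative interactions that are recovered."""
    return true_positives / (true_positives + false_negatives)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, fp, fn = 9, 16, 10
    print(f"precision   = {precision(tp, fp):.2f}")
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

The trade-off reported in the text is the usual one: a stricter predictor returns fewer pairs, which raises precision at the cost of sensitivity.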

In total, our model predicted 273 A-C pairs in evolutionary
fan-in motifs in S. cerevisiae protein complexes, for most of which
(60%) a genetic interaction has not been measured. However, to
provide empirical evidence for the predicted negative interactions,
we exploited available genetic interaction data in
Schizosaccharomyces pombe and Drosophila melanogaster via orthology
definitions from STRING 7.0 (ref. 29). Following this approach,
we found that for ten of our A-C pairs a genetic interaction
has been experimentally found either in S. pombe or in
D. melanogaster and, as expected, most cases (8/10) show a negative
genetic interaction in those species (Supplementary Data 1).

Negative genetic interactions reveal cancer drug targets. The
screen for negative genetic interactions has been shown to be a
valuable strategy in the search for candidate cancer drug targets.
The common approach is to find proteins that have a
negative genetic interaction with either an oncogene or a
tumour-suppressor gene. As mutations in these genes cause cancer, the
idea is that mutations in their negative genetic interaction partner
would prevent cancer cells from growing (that is, synthetic lethality) and
leave normal cells relatively viable. Although a number of
promising examples have been reported to target cancer cells via
synthetic lethality, discovering genetic interactions by
experimental approaches is very labour intensive. Therefore, we
asked whether our model captures conserved genetic interactions
between S. cerevisiae and other species like D. melanogaster, such
that it can serve as a framework to establish genetic interactions
for medical genetics. Based on the orthologue profiles from
STRING 7.0 (ref. 29), we found that 90.1% (246 out of 273) of the
predicted fan-in motifs in yeast are conserved in D. melanogaster,
that is, all three genes in a fan-in motif are present in
D. melanogaster. Of these, nine have been examined for genetic
interaction in D. melanogaster, and all show a negative genetic
interaction (note, 7/9 have also been found in S. cerevisiae). This
suggests that our model can be used to predict negative genetic
interactions in other species, such as mouse or human, which
could provide a basis for prioritizing drug targets.
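
The conservation criterion used here (all three genes of a motif have an orthologue in the target species) is simple to express in code. In the sketch below, the `orthologues` mapping and the gene names are hypothetical stand-ins for an orthology table such as the one derived from STRING; no real database call is made.

```python
from typing import Dict, List, Set, Tuple

Motif = Tuple[str, str, str]  # (A, B, C) gene identifiers in the source species


def conserved_motifs(
    motifs: List[Motif],
    orthologues: Dict[str, Set[str]],
) -> List[Motif]:
    """Keep motifs whose three genes all have at least one orthologue."""
    return [m for m in motifs if all(orthologues.get(gene) for gene in m)]


def conserved_fraction(
    motifs: List[Motif],
    orthologues: Dict[str, Set[str]],
) -> float:
    """Fraction of motifs that are fully present in the target species."""
    if not motifs:
        return 0.0
    return len(conserved_motifs(motifs, orthologues)) / len(motifs)


if __name__ == "__main__":
    # Hypothetical yeast motifs and fly orthologue table.
    yeast_motifs = [("geneA", "geneB", "geneC"), ("geneD", "geneE", "geneF")]
    fly_orthologues = {
        "geneA": {"flyA"},
        "geneB": {"flyB"},
        "geneC": {"flyC1", "flyC2"},
        "geneD": {"flyD"},
        "geneE": set(),  # no orthologue found
    }
    print(conserved_fraction(yeast_motifs, fly_orthologues))
```

Applied to the numbers in the text, the same ratio is 246/273, or about 90.1%.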

To achieve the highest coverage, we combined two strategies to
predict cancer drug targets: (i) by using orthology mapping from
predicted negative interactions from yeast to human and (ii) by a
direct application of our model to human protein complexes.
By using orthologues of the genes in the predicted fan-in motifs
in yeast, we predicted ~250 novel negative genetic interaction
pairs in human, of which 36 involve a cancer-related gene
(oncogene or tumour-suppressor gene). Notably, most of these
pairs (83.3%, 30/36) have not been reported before as negative
genetic interactions. To apply our method directly on human
protein complexes, we first trained our model with functionally
asymmetric enzyme pairs from the human genome-scale
metabolic network (see Methods). Then we used the model to
predict negative genetic interaction pairs. In total, we predicted
1,012 gene pairs with negative genetic interactions, of which 57
involve a cancer-related gene. Thus, combining both strategies, we
predicted 93 cases with cancer-related genes. The genes that have a
negative genetic interaction with these cancer-related genes are
potential drug targets if they are essential in cancer cells while
non-essential in normal cells. By examining gene essentiality in
cancer cells and non-essentiality in normal cells of Mus musculus (mouse),
D. melanogaster or Danio rerio (zebrafish) (Supplementary
Data 2), we found that 30 out of the 93 involve promising
cancer drug targets (pink column in Fig. 4; Supplementary Data
2). Among these 30 prioritized targets, 20 have been found to be
essential in at least one cancer type, that is, breast, ovary or
pancreas, and non-essential in a model organism (pink column in
Fig. 4). Interestingly, most of these predicted targets (16/20) are
essential in the exact cancer type where their negative genetic
interaction partners, the cancer-related genes, are reported to be
mutated or overexpressed (blue column in Fig. 4; Supplementary
Data 2). This empirical evidence suggests that the lethality of the
cancer cells by knockdown of the predicted target gene is actually
caused by a lethal double mutant of the predicted targets and the
cancer-related gene. As expected, when considering only those
cases for which essentiality is measured in cancer cells (that is,
60 + 18 cases, green column in Fig. 4), the predicted targets
are more likely to be essential in cancer and non-essential
in normal cells compared with non-motif gene pairs
(fraction essential in cancer and non-essential in normal cells:
0.26 versus 0.12; one-tailed Fisher's
exact test; P = 4.8e-04). The remaining ten targets have not yet
been experimentally examined for essentiality in cancer cells:
six are non-essential in M. musculus or D. melanogaster and four
have not yet been measured (pink column in Fig. 4;
Supplementary Data 2). Thus, we predicted 30 potential drug
targets in total, that is, 20 with empirical support of essentiality in
cancer cells and 10 novel ones (Table 1 and Supplementary
Data 2). To give an example, one of the predicted drug targets,
TLE1, has a negative genetic interaction with the cancer-related
gene HDAC1 (ref. 39). HDAC1 failed to be a direct drug target as
the inhibition of mouse Hdac1 caused embryonic lethality in
normal development. TLE1, on the other hand, has been found


[Figure 4 diagram. Fan-in motif in which A is a mutated oncogene/tumour-suppressor gene, C is the predicted target and A-C have a predicted negative (neg) interaction. Essentiality profile for C in cancer cells, 93 predicted targets: 60 essential (E) in cancer, 18 non-essential (NE) in cancer, 15 unknown. Essentiality profile for C in normal cells: of the 60, 40 are E and 20 are NE in normal cells; of the 15, 6 are NE, 4 are unknown and 5 are E in normal cells. Mutation information for A in cancer cells, for the 20 supported targets: 6 with A mutated in the same cancer cell line, 10 with A mutated in the same cancer type, 4 unknown. E = essential; NE = non-essential.]

Figure 4 | Prioritized cancer drug targets with empirical support. Thirty prioritized cancer drug targets. The numbers in red represent the 30 promising 
targets. The numbers in grey represent the genes that failed to be targets as they are either non-essential in cancer cells or are essential in normal cells. 
In total, 93 genes are predicted to have a negative genetic interaction with a cancer-related gene. By examining gene essentiality in cancer and non- 
essentiality in normal cells, we prioritized 30 cancer drug targets. Twenty cases are essential in at least one cancer type, that is, breast, ovary or pancreas 
and non-essential in normal cells. Experimentally detected genetic variation/overexpression of the cancer-related gene and the RNA interference of the 
predicted targets in the same cancer type or the same cancer cell line were combined suggesting a double mutant. For most of these (16/20), there is 
empirical evidence that the cancer-related gene is mutated in either the same cancer type or the same cancer cell line as the predicted targets (blue 
column). For six cases that are confirmed to be non-essential in normal cells, the essentiality in cancer still needs to be examined experimentally. For four 
cases, the essentiality in both cancer cells and normal cells is not yet measured (red numbers in the pink column). 
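
The prioritization summarized in Figure 4 amounts to a filter on two essentiality measurements per predicted target. The sketch below encodes one possible reading of that filter; the category labels, the use of `None` for "not measured" and the "unresolved" fallback are assumptions for illustration. The counts in the example are those shown in the figure, with the normal-cell status of the 18 targets that are non-essential in cancer treated as unknown because the figure does not break it down.

```python
from collections import Counter
from typing import Optional


def prioritize(cancer_essential: Optional[bool],
               normal_essential: Optional[bool]) -> str:
    """Assign a predicted target to a category.

    True = essential, False = non-essential, None = not measured.
    """
    if cancer_essential is False or normal_essential is True:
        return "rejected"      # no effect in cancer, or toxic in normal cells
    if cancer_essential is True:
        # Essential in cancer; supported only if shown NE in normal cells.
        return "supported" if normal_essential is False else "unresolved"
    return "novel"             # cancer essentiality still to be measured


if __name__ == "__main__":
    # (cancer_essential, normal_essential) -> number of predicted targets
    figure_counts = {
        (True, True): 40,
        (True, False): 20,
        (False, None): 18,
        (None, False): 6,
        (None, None): 4,
        (None, True): 5,
    }
    tally = Counter()
    for (cancer, normal), n in figure_counts.items():
        tally[prioritize(cancer, normal)] += n
    print(dict(tally))
```

With these inputs the tally reproduces the split described in the caption: 20 targets with empirical support and 10 novel ones, out of 93 predictions.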
Table 1 | List of predicted cancer drug targets.

Cancer-related genes | Predicted target genes with empirical evidence (essential in cancer cell, non-essential in normal cell) | Novel predicted target genes (* unknown essentiality in normal cell and cancer cell; ** non-essential in normal cell and unknown essentiality in cancer cell)
DDB2, FBXW7, EML4 | LSMD1 | PPIB, PPIC, PPID**
 | HDAC5, HDAC6 | NTG1, GNL3L, SSTR5**
 | PPWD1, PPIH, PPIE |
 | KIAA0564 |
 | NSUN2, NSUN5 |
BCL3, NOTCH2, CDKN2C, NFKB2 | HDAC5, HDAC6 | PPIB, PPIC, PPID**
 | PPWD1, PPIH, PPIE |
MSI2 | DENR |
CREBBP, EP300, BRD4, PBRM1, BRD3, TRIM24, SMARCA4 | USP12, USP22, USP30 | USP45**
PIK3CA | USP12, USP22, USP30 | USP45**
HDAC1 | [illegible in source] |
DDX5, DDX6, DDX10, EIF4A2 | | ATAD2*
PBRM1 | ACTG1 |
MED12 | MED14 |
[illegible in source] | H3F3A |
EZH2 | EED |
RPN1 | MAGT1 |
PIK3R1 | ESR1 |
SMAD4 | JUND |
CCNE1 | SKP1 |
CREBBP | KAT2B |
FANCF | | C17orf70*
FANCC | | FANCB*
FANCF | | FANCB [marker illegible in source]
SEPT9 | | SEPT11 [marker illegible in source]

Twenty predicted cancer drug targets with experimental evidence (second column) and ten novel targets (third column). The grey-shaded rows show targets predicted by using orthology mapping and the 
rows without show targets predicted directly from human protein complexes. Each grey-shaded cell represents a Clusters of Orthologous Group (COG) of cancer-related genes or predicted drug targets. 
The first column shows the cancer-related genes. The second column shows 20 predicted drug targets with empirical evidence where each gene is essential in at least one cancer cell and non-essential in 
model organisms. The third column shows ten novel targets for which the essentiality in cancer cells has not yet been measured. 



to be essential in several types of cancer cells and non-essential
in D. melanogaster. In addition, TLE1 and HDAC1 have a
confirmed negative genetic interaction in D. melanogaster,
which makes TLE1 a promising drug target for further analysis.

Another example is the predicted interaction between NSUN2
and FBXW7 (Fig. 5). NSUN2 is experimentally found to be
essential in cancer cells, and FBXW7, a tumour-suppressor
gene, has been found to be mutated in cancer cells. In normal
cells, NSUN2 and FBXW7 both function to regulate cellular
differentiation via two different mechanisms. FBXW7 regulates
cell differentiation by inhibiting c-Myc and proteins in the Notch
pathway, and NSUN2 functions to maintain normal cell
differentiation when activated by the LEF1/β-catenin complex,
which is part of the Wnt pathway. It has been found that the
loss of FBXW7 results in elevated expression of c-Myc, which
results in an upregulation of NSUN2 (ref. 48). As a result, NSUN2
stabilizes the mitotic spindle in fast cell proliferation in cancer cell
growth. Thus, targeting NSUN2 can kill cancer cells while
leaving normal cells relatively unaffected, owing to the
compensatory FBXW7-Notch pathway. This is further
supported by the non-essentiality of NSUN2 in a normal mouse
model.



Discussion 

Even though various experimental techniques are available to
study protein function, understanding the functions of proteins within
protein complexes and their relationships with each other in a
complex remains a challenge. Here, we have developed a model to
predict functional relationships within protein complexes using
the evolutionary history of genomes in terms of gene gain and
loss events. We first focused on the relationships in which the
function of one protein A depends on the function of another
protein B, but in which the reverse relationship is much weaker.
We predicted such asymmetry by integrating various evolutionary
scenarios, for example, that gene B is more frequently gained
across evolutionary history in the absence of gene A than vice
versa. So, B can occur without A, but A cannot occur without
B. To validate the method, we showed that the predicted
functional asymmetry is consistent with various sources of
empirical evidence, such as asymmetry in gene essentiality and
single-knockout growth defects. However, one-third of the gene
pairs in the same complex with asymmetry in gene essentiality
(633/2,130) have not been captured by our model. Perfect
prediction cannot be expected, however. First, the model is strictly
based on complete gene loss and gain, and may therefore
benefit from more fine-scale evolutionary events, such as the
incorporation of mutation rates. For instance, in species where both A
and B are still present, there might be asymmetry in the extent to
which they have diverged at the sequence level from their
respective ancestors. Second, the essentiality and growth defect
data sets cover specific nutrient environments, which might
not be experienced by species in our evolutionary model. Such
differences in physiological conditions may result in different
genome evolution and, as such, could negatively affect model
predictions.

Interestingly, functional asymmetric relationships can be used 
to predict negative genetic interactions in those cases where 
multiple proteins, for instance, A and C, have asymmetry with the 
same protein B. The underlying idea is that functional asymmetry 
is owing to the fact that A and C can compensate for each other's 
absence. Indeed, we have shown that our model increases the 



NATURE COMMUNICATIONS | 4:2124 | DOI: 10.1038/ncomms3124 | www.nature.com/naturecommunications 

© 2013 Macmillan Publishers Limited. All rights reserved. 






[Figure 5 panel labels: Normal cell | Cancer cell; annotation: No lethal phenotype]

Figure 5 | A cancer drug target revealed by predicted negative genetic interactions. NSUN2 is predicted to have a negative genetic interaction with
FBXW7. Targeting NSUN2 can kill the cancer cells while leaving the normal cell relatively unaffected. In normal cells, the tumour-suppressor gene
FBXW7 has functional redundancy with NSUN2 in regulating cellular differentiation. In cancer cells, loss of function of FBXW7 results in an elevated
expression of c-Myc. Activation of c-Myc results in the upregulation of NSUN2, which is essential for cell proliferation. Owing to a synthetic lethality between
FBXW7 and NSUN2, targeting NSUN2 kills cancer cells, while leaving normal cells relatively unaffected. A solid line represents an active protein, for
example, FBXW7 inhibiting the accumulation of NOTCH1 and NOTCH3 in normal cells. A red cross illustrates the disruption of the function of a protein,
that is, transcriptional regulation of NSUN2 by c-Myc is repressed in normal cells, or the disruption of a cellular function, that is, differentiation or
proliferation.



predictive power for negative genetic interactions in S. cerevisiae
protein complexes by 50-100% relative to random protein pairs
from the same complex. Moreover, we predict many novel genetic
interactions, of which eight have been experimentally verified in
S. pombe and D. melanogaster. Although this number is not high,
this is not unexpected given that genome-scale genetic interaction
screens have only been performed in S. cerevisiae and
S. pombe. Moreover, only one screen under standard nutrient
conditions has been conducted for each of these two species,
and it remains to be seen how genetic interactions vary across
other nutrient conditions. It would therefore be interesting to
apply our model to study the variation of interactions across
conditions, for example, to re-evaluate evolutionary theories that
are based on adaptive landscapes (for example, robustness against
mutations in single environments) initiated by studies on
metabolism. One strategy would be to integrate gene expression
profiles across a wide range of nutrient conditions with protein
complex data to remove unexpressed subunits from protein
complexes and thus obtain condition-specific complexes. In
addition, condition-specific essentiality profiles could be used to
constrain the independent protein (B) to be essential. As a result,
one could predict fan-in motifs that are specific for a given
condition. Our finding that the predictive power for negative
genetic interactions doubles when incorporating essentiality
supports such a strategy.

Various recent studies have stated that the discovery of
(negative) genetic interactions is a very important step towards a
full understanding of the genetic basis of complex diseases and
towards providing a framework to discover drug targets. Even though
a number of very promising candidate drug targets have been
discovered by using genetic interactions, effectively
identifying them experimentally remains a major challenge. In
contrast to other prediction models, our model does not
depend on genetic interaction screens, which are available in a
very limited number of species. Thus, our model can be applied to
predict genetic interactions in species even when no empirical
genetic interaction information is available. Moreover, compared
with Pandey et al., our model has a significantly higher
prediction precision. Our approach is therefore useful to
prioritize drug candidates. We found 20 cancer drug targets for
which there is empirical evidence that they are essential in
tumour cells and predicted ten novel drug targets. Notably, the
majority (25/30) have not been measured in yeast or predicted by
other computational approaches. These targets are promising
given the empirical evidence that all are essential in at least one
cancer type, such as breast, pancreas or ovarian cancer, while
non-essential in model organisms. Future studies should reveal to
what extent these targets are (i) non-essential in the same human
healthy cell type, (ii) conserved across different tumour types and
(iii) vulnerable to drug compounds.

In the light of medical genetics, the presented approach can be 
applied to mammals when future protein-protein interaction 
networks become available for cancer and normal cell lines. This 
would allow for more specific predictions of cancer drug targets.
It should be noted that although we focused on protein
complexes, our model might be used to predict interactions for 
any gene pair, as long as they are known to have a functional 
relationship between them. Taken together, this study shows that 
negative genetic interactions in protein complexes can be 
predicted by genome evolution, which has an application in 
searching for drug targets and in understanding human diseases. 

Methods 

Protein complexes. We used 409 protein complexes of S. cerevisiae and 2,468
protein complexes of human to predict negative genetic interactions
within them. For each protein complex, all pair-wise combinations
of proteins were generated as input for our evolutionary model.

Reconstruction of ancestral states. We used the presence and absence of
orthologous genes across 373 species obtained from STRING 7.0 orthologous
groups to reconstruct ancestral states of genes. We inferred the most
parsimonious ancestral presence/absence states of each gene by using a rooted
trifurcation (Archaea/Eukaryote/Eubacteria) phylogenetic tree of 373 species. All
results were obtained using a gain/loss cost ratio of 2/1 and a delayed transition
assumption (DELTRAN) in PAUP. From the ancestral state reconstruction, we
generated 11 evolutionary variables as inputs to the following classification step.
The integration of these 11 evolutionary variables gives the highest correct
classification rate compared with alternative integrations of evolutionary variables.
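
As a rough illustration of weighted parsimony with the 2/1 gain/loss cost ratio described above, the following Python sketch reconstructs presence/absence states for a single gene on a toy tree using the Sankoff algorithm. The `Node` class, the function names and the four-species tree are hypothetical. Ties are broken by keeping the parent's state (absence at the root), which is a simplification and is not claimed to reproduce the DELTRAN optimization in PAUP.

```python
INF = float("inf")
GAIN_COST = 2  # gene absent in parent, present in child
LOSS_COST = 1  # gene present in parent, absent in child


class Node:
    """Tree node; leaves have no children (hypothetical helper class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.cost = [0, 0]  # minimal subtree cost if node is absent / present
        self.state = None   # reconstructed state: 0 = absent, 1 = present


def transition_cost(parent_state, child_state):
    if parent_state == child_state:
        return 0
    return GAIN_COST if child_state == 1 else LOSS_COST


def score(node, leaf_states):
    """Bottom-up pass: fill node.cost for every node."""
    if not node.children:
        observed = leaf_states[node.name]
        node.cost = [0 if observed == 0 else INF, 0 if observed == 1 else INF]
        return
    for child in node.children:
        score(child, leaf_states)
    node.cost = [
        sum(min(transition_cost(s, t) + child.cost[t] for t in (0, 1))
            for child in node.children)
        for s in (0, 1)
    ]


def assign(node, parent_state=None):
    """Top-down pass: pick one most parsimonious state per node."""
    if parent_state is None:
        # Assumption: a tie at the root is resolved as "absent".
        node.state = 0 if node.cost[0] <= node.cost[1] else 1
    else:
        # Assumption: a tie below the root keeps the parent's state.
        node.state = min(
            (0, 1),
            key=lambda t: (transition_cost(parent_state, t) + node.cost[t],
                           t != parent_state),
        )
    for child in node.children:
        assign(child, node.state)


def reconstruct(root, leaf_states):
    """Return ({node name: state}, total parsimony cost)."""
    score(root, leaf_states)
    assign(root)
    states, stack = {}, [root]
    while stack:
        node = stack.pop()
        states[node.name] = node.state
        stack.extend(node.children)
    return states, min(root.cost)


# Toy example: the gene is present in species s1 and s2, absent in s3 and s4.
tree = Node("root", [Node("n1", [Node("s1"), Node("s2")]),
                     Node("n2", [Node("s3"), Node("s4")])])
states, total = reconstruct(tree, {"s1": 1, "s2": 1, "s3": 0, "s4": 0})
```

With these costs, a single loss on the branch leading to `n2` (cost 1) is preferred over a single gain on the branch leading to `n1` (cost 2), so the gene is reconstructed as present at the root.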

For each gene pair A-B, we examined the following six evolutionary scenarios:
(i) both genes were absent in the ancestor (a) and one was gained in the descendant
(d) (a00_d10 or a00_d01), (ii) the presence of only one gene in the ancestor was
maintained in the descendant (a10_d10 or a01_d01), (iii) both genes were present
in the ancestor and one was lost in the descendant (a11_d01 or a11_d10), (iv) a
gain of one gene occurred when the other was present in the ancestor (a01_d11
or a10_d11), (v) only one gene was present in the ancestor and was lost in the
descendant (a10_d00 or a01_d00) and (vi) only one gene was present in the
ancestor and was lost in the descendant while the other gene was gained (a10_d01
or a01_d10) (Fig. 2b).

For gene A, we also calculated five fractions that reflect evolutionary asymmetry
between A and B, f1: a11_d01/(a11_d01 + a11_d10), f2: a01_d11/(a01_d11 +
a10_d11), f3: a01_d01/(a01_d01 + a10_d10), f4: a01_d11/(a01_d11 + a00_d10) and
f5: a01_d11^[a01_d11 - a00_d10 + 1]/[a01_d11 + a00_d10].
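
To make the bookkeeping concrete, the sketch below counts scenarios over a list of tree branches and derives the four ratio-type fractions for gene A. It assumes each scenario is written as a code such as a11_d01, where the two digits after "a" give the presence (1) or absence (0) of A and B in the ancestor and the two digits after "d" give the same for the descendant. The function names, the input layout and the choice to return `None` for an empty denominator are assumptions made for illustration; the fifth fraction is left out for brevity.

```python
from collections import Counter


def scenario_code(ancestor, descendant):
    """Encode one branch, e.g. ((1, 1), (0, 1)) -> 'a11_d01'.

    Each state is a tuple (A present?, B present?) of 0/1 values.
    """
    return "a{}{}_d{}{}".format(*ancestor, *descendant)


def count_scenarios(branches):
    """Count how often each ancestor/descendant scenario occurs."""
    return Counter(scenario_code(anc, desc) for anc, desc in branches)


def ratio(numerator, other):
    """numerator / (numerator + other); None if both counts are zero."""
    total = numerator + other
    return numerator / total if total else None


def asymmetry_fractions(counts):
    """Fractions f1 to f4 for gene A (missing codes count as zero)."""
    return {
        "f1": ratio(counts["a11_d01"], counts["a11_d10"]),
        "f2": ratio(counts["a01_d11"], counts["a10_d11"]),
        "f3": ratio(counts["a01_d01"], counts["a10_d10"]),
        "f4": ratio(counts["a01_d11"], counts["a00_d10"]),
    }


# Hypothetical branch list: A is lost three times while B is kept, B is lost
# once while A is kept, A is gained twice next to B and twice on its own.
branches = (
    [((1, 1), (0, 1))] * 3
    + [((1, 1), (1, 0))]
    + [((0, 1), (1, 1))] * 2
    + [((0, 0), (1, 0))] * 2
)
fractions = asymmetry_fractions(count_scenarios(branches))
```

The fractions for gene B would be obtained by swapping the roles of the two genes in each code.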

Evolutionary information to predict asymmetry of gene pairs. First, we
reconstructed ancestral states for 2,400 directionally coupled enzyme pairs
(functional asymmetry; A→B) in the yeast metabolic network and generated the 11
evolutionary variables. Directional coupling was found by constraining the reaction
flux of one enzyme to a finite value followed by minimizing and maximizing the flux
of the other, and vice versa. A→B is found when the minimum flux through A is zero
while the maximum is a finite value (when constraining B to a finite flux), and the
minimum and maximum fluxes through B are both non-zero when A is constrained to a
finite value. Thus, the activity of A depends on the activity of B, but not the reverse
(A→B). Each of the 11 evolutionary variables is a predictor for functional
asymmetry between an enzyme pair (A→B) as the 11 variables of a dependent
A differ from those of an independent B. More specifically, for evolutionary events
(i) and (ii), A is expected to occur less often than the independent B. For the other
four evolutionary events, A is expected to occur more often than the independent
B. Furthermore, we expect f1, f2, f3 and f4 to be larger than 0.5, and f5 to be larger
than 0 for asymmetric functional relationships (Supplementary Fig. S1).

Given the fact that these 11 variables are not independent from each other, we
used a Tree Augmented Naive Bayes (TAN) classifier to integrate these predictors.
The TAN classifier relaxes the assumption of independence of input variables. The
classifier was trained on the 2,400 directionally coupled enzyme pairs using WEKA.
The classifier estimates conditional probabilities of one gene being independent
(α1) or dependent (α2), where α1 + α2 = 1. The gene is predicted to be either
independent (if α1 > α2) or dependent (if α1 < α2) with a conditional probability (α),
where α = max(α1, α2). To predict the functional relationship in protein complexes
by this classifier, we first reconstructed ancestral states for 8,711 protein pairs
(A-B) and generated the 11 evolutionary variables for A and B. The functional
relationship of A-B is determined as follows: (i) the classifier generates a class
prediction for A with a probability estimate (α); (ii) predictions with
high confidence are identified by a cutoff on α (any prediction where α is smaller
than the cutoff remains unclassified); (iii) protein B is predicted in the same way;
and (iv) the classification results of A and B are combined: A-B is predicted as
functionally asymmetric when A is dependent and B is independent or vice versa.
Otherwise, there is no evidence for functional asymmetry and the pair remains
unclassified (Fig. 2c).
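
The four-step decision rule above can be sketched in a few lines of Python. The sketch takes the classifier's probability that a gene is independent as given (it does not implement the TAN classifier itself); the function names, the string labels and the default cutoff of 0.7 are illustrative assumptions, the cutoff being the value reported later in the Methods for protein complexes.

```python
def classify_gene(p_independent, cutoff=0.7):
    """Return 'independent', 'dependent' or None (unclassified).

    p_independent is the classifier's probability that the gene is
    independent; the probability of being dependent is its complement.
    """
    p_dependent = 1.0 - p_independent
    confidence = max(p_independent, p_dependent)
    if confidence < cutoff or p_independent == p_dependent:
        return None  # below the cutoff (or an exact tie): unclassified
    return "independent" if p_independent > p_dependent else "dependent"


def classify_pair(p_independent_a, p_independent_b, cutoff=0.7):
    """Combine the two per-gene calls into a call for the pair A-B.

    Returns 'A->B' if A depends on B, 'B->A' if B depends on A, and
    None if there is no evidence for functional asymmetry.
    """
    call_a = classify_gene(p_independent_a, cutoff)
    call_b = classify_gene(p_independent_b, cutoff)
    if call_a == "dependent" and call_b == "independent":
        return "A->B"
    if call_a == "independent" and call_b == "dependent":
        return "B->A"
    return None
```

For example, `classify_pair(0.1, 0.9)` yields `'A->B'`, whereas a pair in which both genes are confidently called independent stays unclassified.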

Gene essentiality. Essentiality data were obtained from the MIPS database. If a gene
was annotated as both essential and non-essential in different sources, the essentiality
was assigned according to the majority rule; if no majority existed, the gene was
marked as ambiguous.



For predicted asymmetric pairs (A→B), we counted cases where only B is
essential (n01) and cases where only A is essential (n10). For symmetric pairs (A-B),
it is expected that n01 is equal to n10. To test against the null hypothesis of no
relationship between predicted functional asymmetry and asymmetry in
essentiality, we subjected the 2 × 2 contingency table of essential/non-essential versus
symmetric/asymmetric to a one-tailed Fisher's exact test.
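
For readers who want to see the mechanics of a one-tailed Fisher's exact test on a 2 × 2 table, here is a minimal standard-library sketch based on the hypergeometric distribution. The function name and the example counts are hypothetical and are not taken from the study; which category goes in which row or column is likewise an assumption of the example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed P value for the 2 x 2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a count
    in the top-left cell that is at least as large as a.
    """
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d     # grand total
    largest = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, largest + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical counts: rows = asymmetric / symmetric pairs,
# columns = only B essential / only A essential.
p_value = fisher_exact_greater(30, 10, 20, 20)
```

A small P value would indicate that cases in which only B is essential are over-represented among the predicted asymmetric pairs.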

Growth defect of single-gene knockouts in rich medium. Growth defect of
single-gene knockouts was obtained from two studies, which cover 75% of all
genes in S. cerevisiae. A growth defect was considered substantial if a gene
knockout caused a growth defect of >10% (ref. 4).

For predicted asymmetric pairs with two non-essential genes (A→B), we
counted cases where knockout of B causes a more severe growth defect (n01) and
cases where knockout of A causes a more severe growth defect (n10). For functional
symmetric pairs (A-B), it is expected that n01 is equal to n10. To test against the
null hypothesis of no relationship between predicted functional asymmetry and
asymmetry in growth defect, we subjected the 2 × 2 contingency table of n01/n10
versus symmetric/asymmetric to a one-tailed Fisher's exact test.

Negative genetic interactions in the metabolic network. We obtained fan-in
motif A-C pairs for metabolic reactions (Fig. 1c) by using flux coupling between
enzymes within the genome-scale metabolic network of S. cerevisiae (see above).
The fan-in motif A-C pairs have two characteristics: (i) A and C both depend on a
third essential protein B (A→B and C→B) and (ii) the A-C pair is uncoupled. For
experimental validation, we used an empirical genetic interaction profile from
Szappanos et al. (obtained from the Supplementary Information). If no
significant genetic information was available for a pair of genes, we treated it as
not measured and excluded it from the analysis.

To test against the null hypothesis of no enrichment of negative genetic
interactions in fan-in motif A-C pairs, we summarized the motif A-C/non-motif
A-C pairs versus with/without negative genetic interaction in a 2 × 2 contingency
table and subjected it to a one-tailed Fisher's exact test.

Negative genetic interactions in protein complexes. We obtained A-C pairs
with evolutionary fan-in motifs (Fig. 1a) from asymmetric functional relationships
predicted by the TAN model (Fig. 2c). The evolutionary fan-in motif is
characterized by (i) A and C depend on B, but not vice versa and (ii) A does not depend
on C and vice versa. To ensure A-C is not an asymmetric pair, we set the
cutoff for significant functional asymmetry at α ≥ 0.7. At this cutoff, the correct
classification rate increases the most at the smallest cost in sample size
(Supplementary Fig. S4).
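
The fan-in motif search can be illustrated with a short Python sketch. It assumes the predicted asymmetric relationships are available as a set of (dependent, independent) tuples and treats the absence of a predicted dependency between A and C as "A does not depend on C and vice versa". The function name, the optional `essential` filter and the toy gene labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fan_in_pairs(dependencies, essential=None):
    """Find A-C pairs that both depend on a shared protein B.

    dependencies: iterable of (dependent, independent) tuples, e.g.
                  ("A", "B") meaning that A depends on B.
    essential:    optional set of genes; if given, only essential B
                  proteins are allowed as the shared partner.
    Returns {(A, C): set of shared B proteins}.
    """
    dependencies = set(dependencies)
    dependents_of = defaultdict(set)
    for dependent, independent in dependencies:
        dependents_of[independent].add(dependent)

    motifs = {}
    for hub, dependents in dependents_of.items():
        if essential is not None and hub not in essential:
            continue
        for a, c in combinations(sorted(dependents), 2):
            # Skip pairs with a predicted dependency between A and C.
            if (a, c) in dependencies or (c, a) in dependencies:
                continue
            motifs.setdefault((a, c), set()).add(hub)
    return motifs


# Toy example: A, C and D all depend on B, and D also depends on A.
toy = {("A", "B"), ("C", "B"), ("D", "B"), ("D", "A")}
motifs = fan_in_pairs(toy)
```

In the toy example the pairs A-C and C-D form fan-in motifs through B, while A-D is excluded because D depends on A.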

Genetic interaction data were obtained from BioGRID. As BioGRID does not
specify whether the genetic interaction of a gene pair has been measured, we
integrated the original data from nine high-throughput data sets to generate a
list of measured genetic interactions (either positive, negative or no interaction). If
a pair of genes has both a negative and a positive genetic interaction, the genetic
interaction was assigned according to the majority rule.
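
A majority rule of this kind might look as follows in Python. The function name and the label strings are hypothetical, and returning `None` on a tie is an assumption, as the text does not say how ties were handled.

```python
from collections import Counter


def majority_label(observations):
    """Return the most frequent label, or None for no data or a tie."""
    ranked = Counter(observations).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: tied evidence is left unresolved
    return ranked[0][0]


# Hypothetical reports for one gene pair across three data sets.
label = majority_label(["negative", "negative", "positive"])
```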

To test against the null hypothesis of no enrichment of negative genetic
interactions in pairs with a fan-in motif, we summarized motif A-C/non-motif
A-C versus with/without negative genetic interaction in a 2 × 2 contingency
table and tested using a one-tailed Fisher's exact test.

Prediction of cancer drug targets. We mapped predicted negative genetic
interactions from yeast to human via orthology from STRING 7.0 (ref. 29). To
retrieve drug targets, we first mapped cancer gene information from the Cancer
Genome Project to the predicted genetic interactions to extract predictions
involving one oncogene or tumour-suppressor gene. Then we mapped gene
essentiality measurements in cancer from Marcotte et al. to the predictions. In
addition, non-essentiality of the drug targets in normal cells/tissues was examined
by literature mining covering M. musculus, D. melanogaster or D. rerio.
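
The filtering steps in this paragraph can be sketched as below, under the assumption that the candidate target is the partner of the cancer gene in each predicted pair. The function name, the dictionary keys and the placeholder genes GENE_X and GENE_Y are hypothetical; only the NSUN2-FBXW7 pair is taken from the text.

```python
def candidate_targets(predicted_pairs, cancer_genes, essential_in_cancer):
    """List partners of cancer genes in predicted negative interactions.

    Only pairs with exactly one oncogene or tumour-suppressor gene are
    kept; the other gene of the pair is reported as the candidate target,
    flagged by whether it was measured as essential in cancer cells.
    """
    candidates = []
    for gene_1, gene_2 in predicted_pairs:
        is_cancer_gene = (gene_1 in cancer_genes, gene_2 in cancer_genes)
        if sum(is_cancer_gene) != 1:
            continue
        if is_cancer_gene[0]:
            cancer_gene, partner = gene_1, gene_2
        else:
            cancer_gene, partner = gene_2, gene_1
        candidates.append({
            "target": partner,
            "cancer_gene": cancer_gene,
            "empirical_support": partner in essential_in_cancer,
        })
    return candidates


pairs = [("NSUN2", "FBXW7"), ("GENE_X", "GENE_Y")]
targets = candidate_targets(pairs,
                            cancer_genes={"FBXW7"},
                            essential_in_cancer={"NSUN2"})
```

Here NSUN2 is returned as a supported candidate because its predicted partner FBXW7 is a tumour-suppressor gene and NSUN2 is essential in cancer cells, whereas the placeholder pair is dropped.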

To predict targets directly from human protein complexes, we trained our
model on functional asymmetric enzyme pairs from the human genome-scale
metabolic network via Flux Coupling Analysis (see also above for yeast and
Burgard et al.). Subsequently, we used experimentally determined protein
complexes to predict human-specific negative genetic interactions via the
evolutionary fan-in motifs. Finally, we examined gene essentiality among predicted
drug targets in cancer and normal cells/tissues as outlined above.